Goto

Collaborating Authors

 diverse solution


Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Neural Information Processing Systems

Over the last decade, a single algorithm has changed many facets of our lives - Stochastic Gradient Descent (SGD). In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumptions), in some cases it may matter which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture-features to ensemble methods and zero-shot coordination. In these settings, there are desired solutions which SGD on easy' solutions. In this paper, we present a different approach. Rather than following the gradient, which corresponds to a locally greedy direction, we instead follow the eigenvectors of the Hessian. By iteratively following and branching amongst the ridges, we effectively span the loss surface to find qualitatively different solutions. We show both theoretically and experimentally that our method, called Ridge Rider (RR), offers a promising direction for a variety of challenging problems.



Trust Region-Based Bayesian Optimisation to Discover Diverse Solutions

Perera, Kokila Kasuni, Neumann, Frank, Neumann, Aneta

arXiv.org Artificial Intelligence

Bayesian optimisation (BO) is a surrogate-based optimisation technique that efficiently solves expensive black-box functions with small evaluation budgets. Recent studies consider trust regions to improve the scalability of BO approaches when the problem space scales to more dimensions. Motivated by this research, we explore the effectiveness of trust region-based BO algorithms for diversity optimisation in different dimensional black box problems. We propose diversity optimisation approaches extending TuRBO1, which is the first BO method that uses a trust region-based approach for scalability. We extend TuRBO1 as divTuRBO1, which finds an optimal solution while maintaining a given distance threshold relative to a reference solution set. We propose two approaches to find diverse solutions for black-box functions by combining divTuRBO1 runs in a sequential and an interleaving fashion. We conduct experimental investigations on the proposed algorithms and compare their performance with that of the baseline method, ROBOT (rank-ordered Bayesian optimisation with trust regions). We evaluate proposed algorithms on benchmark functions with dimensions 2 to 20. Experimental investigations demonstrate that the proposed methods perform well, particularly in larger dimensions, even with a limited evaluation budget.


M-Best-Diverse Labelings for Submodular Energies and Beyond

Alexander Kirillov, Dmytro Shlezinger, Dmitry P. Vetrov, Carsten Rother, Bogdan Savchynskyy

Neural Information Processing Systems

We consider the problem of finding M best diverse solutions of energy minimization problems for graphical models. Contrary to the sequential method of Batra et al., which greedily finds one solution after another, we infer all M solutions jointly. It was shown recently that such jointly inferred labelings not only have smaller total energy but also qualitatively outperform the sequentially obtained ones. The only obstacle for using this new technique is the complexity of the corresponding inference problem, since it is considerably slower algorithm than the method of Batra et al. In this work we show that the joint inference of M best diverse solutions can be formulated as a submodular energy minimization if the original MAP-inference problem is submodular, hence fast inference techniques can be used. In addition to the theoretical results we provide practical algorithms that outperform the current state-of-the-art and can be used in both submodular and non-submodular case.


Review for NeurIPS paper: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Neural Information Processing Systems

The problem itself involves an extremely wide interest group across all areas of ML/AI. The outcome and the insights from this work can be applied to any problem in AI where non-convex optimization is involved. For instance Theorem 1, provides a very important theoretical result that is essential towards realizing why the approach might work. 5. The approximate version of the RR algorithm is also a a very nice addition, given that for larger sample spaces eigen decomposition of Hessian computation can be prohibitive and thus updates using Hessian-eigenvector products is a very nice touch.


Review for NeurIPS paper: Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Neural Information Processing Systems

This paper proposes Ridge Regression, a novel method for exploring local minima using eigenvectors of Hessian. The novelty of the method was appreciated and it was also noted that the method can connect to classic methods such as Branch and Bound technique. There was also appreciation of the empirical results. However, the lack of mathematical insights related to diversity of solutions, --a key claim of the paper-- was missed. Even if there was no analysis a comparison with the random orthonormal set of descent directions in producing diverse directions would have been illustrative to understand the power of the method.


Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Neural Information Processing Systems

Over the last decade, a single algorithm has changed many facets of our lives - Stochastic Gradient Descent (SGD). In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumptions), in some cases it may matter which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture-features to ensemble methods and zero-shot coordination. In these settings, there are desired solutions which SGD on standard' loss functions will not find, since it instead converges to theeasy' solutions.


M-Best-Diverse Labelings for Submodular Energies and Beyond Alexander Kirillov 1 Dmitrij Schlesinger

Neural Information Processing Systems

We consider the problem of finding M best diverse solutions of energy minimization problems for graphical models. Contrary to the sequential method of Batra et al., which greedily finds one solution after another, we infer all M solutions jointly. It was shown recently that such jointly inferred labelings not only have smaller total energy but also qualitatively outperform the sequentially obtained ones. The only obstacle for using this new technique is the complexity of the corresponding inference problem, since it is considerably slower algorithm than the method of Batra et al. In this work we show that the joint inference of M best diverse solutions can be formulated as a submodular energy minimization if the original MAP-inference problem is submodular, hence fast inference techniques can be used. In addition to the theoretical results we provide practical algorithms that outperform the current state-of-the-art and can be used in both submodular and non-submodular case.


Many-Objective Multi-Solution Transport

Li, Ziyue, Li, Tian, Smith, Virginia, Bilmes, Jeff, Zhou, Tianyi

arXiv.org Artificial Intelligence

Optimizing the performance of many objectives (instantiated by tasks or clients) jointly with a few Pareto stationary solutions (models) is critical in machine learning. However, previous multi-objective optimization methods often focus on a few number of objectives and cannot scale to many objectives that outnumber the solutions, leading to either subpar performance or ignored objectives. We introduce Many-objective multi-solution Transport (MosT), a framework that finds multiple diverse solutions in the Pareto front of many objectives. Our insight is to seek multiple solutions, each performing as a domain expert and focusing on a specific subset of objectives while collectively covering all of them. MosT formulates the problem as a bi-level optimization of weighted objectives for each solution, where the weights are defined by an optimal transport between the objectives and solutions. Our algorithm ensures convergence to Pareto stationary solutions for complementary subsets of objectives. On a range of applications in federated learning, multi-task learning, and mixture-of-prompt learning for LLMs, MosT distinctly outperforms strong baselines, delivering high-quality, diverse solutions that profile the entire Pareto frontier, thus ensuring balanced trade-offs across many objectives.


Continuous Tensor Relaxation for Finding Diverse Solutions in Combinatorial Optimization Problems

Ichikawa, Yuma, Iwashita, Hiroaki

arXiv.org Artificial Intelligence

Finding the best solution is the most common objective in combinatorial optimization (CO) problems. However, a single solution may not be suitable in practical scenarios, as the objective functions and constraints are only approximations of original real-world situations. To tackle this, finding (i) "heterogeneous solutions", diverse solutions with distinct characteristics, and (ii) "penalty-diversified solutions", variations in constraint severity, are natural directions. This strategy provides the flexibility to select a suitable solution during post-processing. However, discovering these diverse solutions is more challenging than identifying a single solution. To overcome this challenge, this study introduces Continual Tensor Relaxation Annealing (CTRA) for unsupervised-learning-based CO solvers. CTRA addresses various problems simultaneously by extending the continual relaxation approach, which transforms discrete decision variables into continual tensors. This method finds heterogeneous and penalty-diversified solutions through mutual interactions, where the choice of one solution affects the other choices. Numerical experiments show that CTRA enables UL-based solvers to find heterogeneous and penalty-diversified solutions much faster than existing UL-based solvers. Moreover, these experiments reveal that CTRA enhances the exploration ability.